Abstract

Distributed Denial of Service (DDoS) is a type of attack whose volume, intensity, and
mitigation costs continue to increase. Attackers use many zombie computers to exhaust the
resources available to a network, application, or service so that authorized users cannot gain
access or the network service goes down, which is a great loss for Internet users on the
networks affected by DDoS attacks. This research proposes a new approach to detecting DDoS
attacks based on network traffic activity that is statistically analyzed using the Gaussian Naive
Bayes method. Data for training and testing are extracted from network traffic at a core router
in the Master of Information Technology Research Laboratory, Ahmad Dahlan University,
Yogyakarta (MITRLADUY). The new approach is expected to work together with an Intrusion
Detection System (IDS) to predict the existence of DDoS attacks based on the average and
standard deviation of network packets in accordance with the Gaussian method.

1. Introduction

The Cisco 2016 Annual Security Report [1] shows that DoS attacks remain one of the
leading external challenges faced by organizations. Only 3% of respondents do not consider
any of the listed threats to be challenges for their organization. Brute-Force Attacks and
Zero-Day Attacks were each reported by 35% of respondents and Denial of Service attacks by
38%, behind Advanced Persistent Threats (43%), Phishing (54%), and Malware (68%). This
shows that DDoS attacks are still worth investigating. Figure 1 shows the Cisco statistics on
the external challenges faced.


Malware                                        68%
Phishing                                       54%
Advanced Persistent Threats                    43%
Denial of Service Attacks                      38%
Brute-Force Attacks                            35%
Zero-Day Attacks                               35%
I Do Not Consider Any of These to Be
Challenges for My Organization                  3%

Source: Security Risk and Trustworthiness Study, Cisco


Figure 1. External challenges faced


Jasreena Kaur Bains et al. [2] proposed a model that uses a Naive Bayes classifier with
the K2 learning process on a reduced NSL-KDD data set for each attack class. In this method,
every layer is individually trained to detect a single attack category, and the outcome of one
layer is used to increase the detection rate and to better categorize both the majority and
minority attacks.

Mangesh D. Salunke and Ruhi Kabra [3] used a Naive Bayes classifier and an Artificial
Neural Network to detect DDoS attacks. Their proposed system is divided into two modules.
The first, Training Set Generation, creates a database for future use: all incoming packets pass
through each level of training set generation to build a dynamic data set, and each incoming
packet is marked as “OK packet” or “Attack”. The second module, the Real-Time Layered
Intrusion Detection System, applies K-means clustering and the Naive Bayes algorithm for data
mining and classification to classify the attack as SYN flood, PING flood, or UDP flood.

Kanagalakshmi R. and Naveenantony Raj [4] used a Hidden Naive Bayes multiclass
classifier model in a network Intrusion Detection System to counter progressively more
sophisticated network attacks. Bharti Nagpal et al. [5] compared many DDoS attack software
tools and gave complete information about them. V. Hema and C. Emily Shin [6] used Naive
Bayes to detect DoS attacks, comparing the detection rate and false positive rate achieved by
the proposed and existing systems. Mangesh Salunke et al. [7] used K-means clustering and
Naive Bayes to classify DDoS attacks, counting malicious packets correctly classified as
malicious as True Positives (TP), normal traffic falsely classified as malicious as False
Positives (FP), malicious traffic classified as normal as False Negatives (FN), and benign
packets correctly classified as benign as True Negatives (TN).
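
The four counts above are commonly combined into a detection rate and a false positive rate. The following Python sketch shows one usual way to do this; the function name and the example counts are hypothetical and are not taken from [6] or [7].

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return detection rate, false positive rate and accuracy from confusion counts."""
    return {
        "detection_rate": tp / (tp + fn),        # share of malicious traffic that was caught
        "false_positive_rate": fp / (fp + tn),   # share of benign traffic that was flagged
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, fn=10, tn=95)
print(metrics)
```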

Gnanapriya N. and Karthik R. [8] showed the superiority of the Naive Bayes detection
method compared with Information Gain and Gain Ratio. Niharika Sharma et al. [9] reviewed
K-means clustering, Naive Bayes, neural networks, fuzzy logic, and genetic algorithms as
machine learning techniques for anomaly-based detection of DoS attacks in an IDS. Mohammed
Alkasassbeh et al. [10] incorporated three well-known classification techniques, Multi-Layer
Perceptron (MLP), Bayes, and Random Forest, and showed that MLP achieved the highest
accuracy. S.H.C. Haris et al. [11] observed and analyzed the IP header in their first experiment.
Five main fields are important for detecting threats. The experiment focused on Internet
Protocol version 4 (IPv4), so the IP version must be 4 and the IP header length must be at
least 20 bytes and at most 60 bytes.
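
The two header conditions mentioned above can be checked directly on the first byte of an IP packet. The sketch below is a minimal illustration of that check only, not the procedure of [11]; the function name is hypothetical, and the other header fields discussed in [11] are not examined.

```python
def is_valid_ipv4_header(packet: bytes) -> bool:
    """Check that the version is 4 and the header length is between 20 and 60 bytes."""
    if len(packet) < 20:
        return False
    version = packet[0] >> 4          # high nibble of the first byte
    ihl_words = packet[0] & 0x0F      # header length in 32-bit words
    header_len = ihl_words * 4        # header length in bytes
    return version == 4 and 20 <= header_len <= 60


# 0x45 means version 4 with a 5-word (20-byte) header; the rest is zero padding.
minimal_header = bytes([0x45]) + bytes(19)
print(is_valid_ipv4_header(minimal_header))
```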

Nikhil S. Mangrulkar et al. [12] proposed an Intrusion Detection System and Intrusion
Prevention System using a Naive Bayes classifier, whose principal parts are preprocessing,
classification, and protection. Majed Tabash and Tawfiq Barhoom [13] compared several
methods for detecting DoS attacks and protecting the network from them: K-NN with K=3,
Decision Tree, SVM, and Naive Bayes. Navdeep Singh et al. [14] presented a comparative
analysis of different DDoS detection techniques: statistical methods, IDS, IDS based on
Dempster-Shafer theory, packet information gathering and preprocessing, network detection,
and real-time detection systems. Primula Is Armani and Imam Radi [15] used K-means
clustering to classify DoS attacks into three danger levels: low, medium, and high.

Previous papers describe different techniques for detecting DoS/DDoS attacks. We
propose further research on network testing, network processing, and analytical methods to
achieve better detection accuracy with different network traffic, and to build a detection
system based on the average and standard deviation according to the Gaussian method.

2. Basic Theory 

2.1. Three-way Handshake 

A TCP connection between computers is opened with a standard exchange called the
three-way handshake, shown in Figure 2. The exchange takes place between the server and
the client, which in an attack scenario is the attacker [16].

In a normal TCP connection, the three-way handshake begins when the client (the
attacker, in an attack scenario) sends a SYN to the server. The server allocates a buffer for
the client and replies with a SYN-ACK packet. At this stage the connection is in a half-open
state, waiting for an ACK response from the client to complete the connection setup. When the
ACK arrives, the connection is established and the three-way handshake is complete. TCP SYN
flood attacks abuse this handshake by keeping the server busy with SYN requests. TCP SYN
flood is a common form of Denial of Service attack. It can be observed with a packet capture
application by using a spanned link to observe a copy of the server activity. A typical feature
of a TCP SYN flood is the frequent appearance of the same incoming IP address at the server.
The number of times an IP address appears at the server within a certain time range is
counted and used as an extracted feature for detecting a DDoS attack [11].
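
Counting how often each source address appears within a time range can be sketched in a few lines of Python. The tuple layout, the function name, and the sample capture below are assumptions made for illustration; a real capture would be read from the packet capture application.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_sources(packets: Iterable[Tuple[float, str]], start: float, end: float) -> Counter:
    """Count how often each source IP appears with start <= timestamp < end."""
    return Counter(src for ts, src in packets if start <= ts < end)


# Hypothetical capture: (timestamp in seconds, source IP address).
capture = [
    (0.1, "172.10.64.199"),
    (0.2, "172.10.64.199"),
    (0.4, "192.168.10.2"),
    (0.9, "172.10.64.199"),
    (181.0, "192.168.10.2"),   # outside the 3-minute window below
]
counts = count_sources(capture, start=0.0, end=180.0)
print(counts.most_common())
```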


[Figure 2 image not reproduced; visible labels: Client, Server]

Figure 2. Three-way handshake


2.2. Gaussian Distribution 

The Gaussian distribution is one of the most common and important distributions in
probability and statistics. It was introduced by Gauss in his study of error theory [17], where
he used it to describe errors. Experience shows that many random variables, such as the
height of adult males and reaction times in psychological experiments, can be modeled by the
Gaussian distribution [18]. The density of the Gaussian distribution is:

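$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \qquad (1)$$

where $\mu$ is the average and $\sigma$ is the standard deviation. The values of $\mu$ and
$\sigma$ for numerical attributes are calculated using the formulas

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2)$$

$$\sigma^{2} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^{2} \qquad (3)$$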
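
As a small illustration of equations (1) to (3), the sketch below evaluates the Gaussian density and estimates the average and the standard deviation with the n - 1 divisor. The sample values and the function name are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Equation (1): Gaussian density with average mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = mean(sample)       # equation (2)
sigma = stdev(sample)   # square root of equation (3); statistics.stdev divides by n - 1
print(mu, sigma, gaussian_pdf(mu, mu, sigma))
```

The density is largest at the average and falls off symmetrically on both sides, which is what allows the average to act as the center of a class and the standard deviation as its spread.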


2.3. Naive Bayes Algorithm 

Bayes' theorem is stated mathematically as the following equation [2]:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \qquad (4)$$

where: A and B are events

$P(A)$ and $P(B)$ are the probabilities of A and B without regard to each other
$P(A \mid B)$, a conditional probability, is the probability of A given that B is true
$P(B \mid A)$ is the probability of B given that A is true

Based on the Gaussian distribution and the Naive Bayes algorithm, we propose to
calculate the incoming IP address count and the packet length, using a packet capture
application, as numerical inputs that can be displayed in a two-dimensional graph. After the
numerical inputs are obtained, the data are processed using the Gaussian Naive Bayes method
based on the calculated average and standard deviation.
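
For two numerical inputs, combining equation (4) with the Gaussian density (1) and the naive assumption that the inputs are independent within a class gives the usual Gaussian Naive Bayes rule. The block below is a sketch of that standard rule; the symbols $x$ (incoming IP count), $y$ (packet length), $c$ (class), and the subscripts are notation introduced here for illustration.

```latex
P(c \mid x, y) \;=\; \frac{P(c)\, f(x;\, \mu_{c,x}, \sigma_{c,x})\, f(y;\, \mu_{c,y}, \sigma_{c,y})}{P(x, y)},
\qquad c \in \{\text{normal}, \text{attack}\}

\hat{c} \;=\; \arg\max_{c}\; P(c)\, f(x;\, \mu_{c,x}, \sigma_{c,x})\, f(y;\, \mu_{c,y}, \sigma_{c,y})
```

Because the denominator $P(x, y)$ is the same for both classes, only the numerators need to be compared.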

3. Proposed Methodology

3.1. Architecture Diagram 

Figure 3 shows the architecture diagram, in which the client is a router in MITRLADUY
connected to an investigator through a spanned link.

a. Capture IP Packets

Network traffic is collected using a packet capture application. The captured packet
traffic is resized for further processing.

b. Analyzing the IP/Data Packet

IP packets are processed and analyzed to distinguish normal network traffic from
traffic under attack.

c. Features Extraction

In this stage the data, originally in the form of a log file, are converted into a form that
can be processed further, so that the important information can be retrieved from them.

d. Training Set Formation

In this stage, training is carried out with the Naive Bayes method on the processed log
data so that attack patterns can be recognized.

e. Apply Gaussian Naive Bayes (Classify)

In this stage, the training results are tested with test data to determine how
successfully DDoS attacks are recognized.

f. Classification Predictor

The system outputs whether the traffic is normal or under attack, together with the
attacker's IP address.
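
The feature extraction stage (step c) can be sketched as an aggregation over captured packets. This is a minimal illustration, assuming that each feature vector consists of the number of packets seen from a source address and the sum of their lengths within the capture window; the function name and the sample packets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def extract_features(packets: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each source IP to (number of packets, total packet length)."""
    totals = defaultdict(lambda: [0, 0])
    for src, length in packets:
        totals[src][0] += 1        # incoming IP count
        totals[src][1] += length   # accumulated packet length
    return {src: (count, total) for src, (count, total) in totals.items()}


# Hypothetical packets: (source IP address, packet length).
packets = [("192.168.10.2", 60), ("172.10.64.199", 1500), ("172.10.64.199", 1500)]
features = extract_features(packets)
print(features)
```

Each resulting pair can then be used as one point in the two-dimensional input space of the classifier.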


[Figure 3 image not reproduced; visible labels: INTERNET, CLIENT]

Figure 3. Architecture diagram of proposed research on MITRLADUY


3.2. Topology 

Figure 4 shows the computer network in MITRLADUY, in which the router serves 8 users
accessing the Internet. In this proposed research, DDoS attacks are captured by the
investigator as input data through the spanned link. The attacks were simulated by 6 attackers
from outside the laboratory against the victim IP address 172.10.64.250 using the DDoS
application Low Orbit Ion Cannon (LOIC), and the Wireshark packet capture application was
used to observe network traffic activity.

[Figure 4 image not reproduced; visible labels: Zombie, Computer 1]

Figure 4. Network topology


3.3. Classification 

The numerical input data are processed by the Gaussian Naive Bayes classifier, which
gives a decision in the form of normal access or attack. Figure 5 shows the proposed
classification process using the Gaussian Naive Bayes method.


[Figure 5 image not reproduced; visible labels: INPUT (INCOMING IP ADDRESS, PACKET LENGTH),
GAUSSIAN NAIVE BAYES CLASSIFIER, OUTPUT]

Figure 5. Classification process


4. Results and Discussion

The attack scenario follows the topology in Figure 4: the IP addresses 172.10.64.199,
172.10.85.151, 172.10.71.29, 172.10.71.49, 172.10.201.5, and 172.10.201.19 perform DDoS
attacks using the LOIC application against the victim with IP address 172.10.64.250. The
attacks last 3 minutes and are captured using a packet capture application.

4.1. Input Parameters 

Packet capture records network traffic activity data such as the time range, incoming
IP address, and packet length. We observe network traffic for 3 minutes because of storage
limits, and calculate the incoming IP address count and packet length to serve as numerical
input parameters that can be visualized in the classification process. Figure 6 shows the
capture result used for the input parameters.

[Figure 6 image not reproduced; packet capture listing with visible column headings No., Time,
Source and TCP [ACK] entries]

Figure 6. Input parameters


Table 1 shows the calculated incoming IP (IIP) counts and packet lengths (PL), which
become the coordinate values on the x-axis and y-axis for the 3-minute time range.


Table 1. MITRLADUY Network Traffic, 3-Minute Time Range

IP address       Incoming IP (IIP) in   Packet length (PL) in   Access   Time range
                 time range (x-axis)    time range (y-axis)              (minutes)
192.168.10.2     36                     10048                   Normal   3
192.168.10.3     2449                   384809                  Normal   3
192.168.10.4     786                    111132                  Normal   3
192.168.10.5     1003                   140340                  Normal   3
192.168.10.6     1174                   160075                  Normal   3
192.168.10.7     1118                   148707                  Normal   3
192.168.10.8     1200                   161525                  Normal   3
192.168.10.9     1606                   226306                  Normal   3
172.10.64.199    4515                   1527368                 Attack   3
172.10.85.151    14320                  2522499                 Attack   3
172.10.201.5     9811                   2044024                 Attack   3
172.10.201.19    4407                   1014119                 Attack   3
172.10.71.29     8338                   2356408                 Attack   3
172.10.71.49     11206                  1694624                 Attack   3


4.2. Current Classification with Naive Bayes in Matlab 

We use Matlab for the classification process because it is familiar to many users and
well suited to displaying graphs.

Matlab provides a Naive Bayes classification facility for data processing, which can
also be used to determine the class of network traffic. The classification process uses the
coefficients K, L, and Q and the function f, which makes it difficult for many people to
understand. Figure 7 shows the Matlab code for Naive Bayes classification with these
coefficients.

    % klasifikasikujadil.m (excerpt; earlier lines are not visible in the figure)
    % Build a grid of points covering the plotted area.
    [X, Y] = meshgrid(linspace(0, 20000), linspace(0, 3000000));
    X = X(:); Y = Y(:);
    % classify with 'quadratic' fits a quadratic discriminant to the training data.
    [C, err, P, logp, coeff] = classify([X Y], [PL PP], kategori, 'quadratic');
    hold on;
    gscatter(X, Y, C, 'gr', 'o', 1, 'off');
    % Coefficients of the boundary between the first and second class.
    K = coeff(1,2).const;
    L = coeff(1,2).linear;
    Q = coeff(1,2).quadratic;
    % The argument list of sprintf is cut off at the edge of the figure.
    f = sprintf('0 = %g+%g*x+%g*y+%g*x^2+%g*x.*y+%g*y.^2', K, L, Q(1,1), Q(1 ...
    h2 = ezplot(f, [0 20000 0 3000000]);
    set(h2, 'Color', 'b', 'LineWidth', 2)
    axis([0 20000 0 3000000])
    xlabel('Incoming IP Address')
    ylabel('Packet Length')
    title('{\bf MITRLADUY NETWORK TRAFFIC CLASSIFICATION}')

Figure 7. Naive Bayes code in Matlab


The result of the network traffic classification is shown in Figure 8. The normal class
set is bounded by the quadratic curve drawn as a blue line, and its members are plotted as
green circles. The other set is the attack class, whose members are plotted as red squares.



Figure 8. Result of classification in Matlab facility 


4.3. Proposed Classification with Gaussian Naive Bayes in Matlab 

Based on Table 1 and formulas (2) and (3), we can calculate the average and standard
deviation of the normal class and the attack class. The average and standard deviation are:

Incoming IP average, normal class               = 1172
Incoming IP average, attack class               = 8766
Packet length average, normal class             = 167868
Packet length average, attack class             = 1859840
Incoming IP standard deviation, normal class    = 686
Incoming IP standard deviation, attack class    = 3877
Packet length standard deviation, normal class  = 106791
Packet length standard deviation, attack class  = 560838

Each class set is built from the average and a matched multiple of the standard
deviation, $\mu \pm x\sigma$, chosen to give the best accuracy so that the set covers all of its
members. The average of each class is the center of its set, and the matched standard
deviation determines the extent of the set. We calculated that the best accuracy is obtained
with the set area $\mu \pm 3\sigma$ for the normal class and $\mu \pm 2.5\sigma$ for the attack
class. Figure 9 shows the normal class set and the attack class set based on the average and
standard deviation.



Figure 9. Gaussian Naive Bayes classification based on the average and standard deviation 
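
The class statistics and the class sets can be recomputed from Table 1 with a short Python sketch. It reads the reported set areas as the average plus or minus 3 standard deviations for the normal class and plus or minus 2.5 standard deviations for the attack class, and it assumes that membership is tested separately on each axis; the exact shape of the sets in Figure 9 is not specified in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev

# (IIP, PL) pairs copied from Table 1.
normal = [(36, 10048), (2449, 384809), (786, 111132), (1003, 140340),
          (1174, 160075), (1118, 148707), (1200, 161525), (1606, 226306)]
attack = [(4515, 1527368), (14320, 2522499), (9811, 2044024),
          (4407, 1014119), (8338, 2356408), (11206, 1694624)]


def class_stats(rows):
    """Return ((mean_iip, std_iip), (mean_pl, std_pl)) using the n - 1 standard deviation."""
    iip = [r[0] for r in rows]
    pl = [r[1] for r in rows]
    return (mean(iip), stdev(iip)), (mean(pl), stdev(pl))


def in_class_set(point, stats, k):
    """Assumed membership test: every feature lies within mean +/- k * std."""
    return all(abs(v - m) <= k * s for v, (m, s) in zip(point, stats))


normal_stats = class_stats(normal)
attack_stats = class_stats(attack)
normal_rounded = [round(v) for pair in normal_stats for v in pair]
attack_rounded = [round(v) for pair in attack_stats for v in pair]
print(normal_rounded, attack_rounded)
print(all(in_class_set(p, normal_stats, 3.0) for p in normal))
print(all(in_class_set(p, attack_stats, 2.5) for p in attack))
```

Under these assumptions every training point falls inside the set of its own class and outside the set of the other class.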


Finally, formula (1) and formula (4) are used to predict network traffic activity so that
the IP addresses attacking the victim can be identified. Table 2 shows the IP addresses that
perform normal activities and the IP addresses that perform attacks.

According to Table 2, the IP addresses that perform normal activities are 192.168.10.2,
192.168.10.3, 192.168.10.4, 192.168.10.5, 192.168.10.6, 192.168.10.7, 192.168.10.8, and
192.168.10.9. The IP addresses that perform attacks are 172.10.64.199, 172.10.85.151,
172.10.201.5, 172.10.201.19, 172.10.71.29, and 172.10.71.49.


Table 2. Network Traffic Prediction of MITRLADUY Activities

No   IP address       IIP in time      PL in time       Access   P(normal|IP)   > / <   P(attack|IP)   Class
                      range (x-axis)   range (y-axis)
1    192.168.10.2     36               10048            NORMAL   1.4688E-09     >       1.96172E-11    NORMAL
2    192.168.10.3     2449             384809           NORMAL   1.2666E-09     >       3.26734E-11    NORMAL
3    192.168.10.4     786              111132           NORMAL   1.8679E-09     >       2.3003E-11     NORMAL
4    192.168.10.5     1003             140340           NORMAL   1.9175E-09     >       2.40365E-11    NORMAL
5    192.168.10.6     1174             160075           NORMAL   1.9305E-09     >       2.47967E-11    NORMAL
6    192.168.10.7     1118             148707           NORMAL   1.927E-09      >       2.4442E-11     NORMAL
7    192.168.10.8     1200             161525           NORMAL   1.9306E-09     >       2.48799E-11    NORMAL
8    192.168.10.9     1606             226306           NORMAL   1.8575E-09     >       2.71337E-11    NORMAL
9    172.10.64.199    4515             1527368          ATTACK   6.3477E-14     <       6.20552E-11    ATTACK
10   172.10.85.151    14320            2522499          ATTACK   4.9321E-30     <       5.33277E-11    ATTACK
11   172.10.201.5     9811             2044024          ATTACK   1.029E-20      <       6.92607E-11    ATTACK
12   172.10.201.19    4407             1014119          ATTACK   1.7146E-11     <       5.29461E-11    ATTACK
13   172.10.71.29     8338             2356408          ATTACK   3.309E-22      <       6.59322E-11    ATTACK
14   172.10.71.49     11206            1694624          ATTACK   1.5572E-19     <       6.76054E-11    ATTACK
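
The prediction step can be sketched in Python as a plain Gaussian Naive Bayes classifier trained on the rows of Table 1. The sketch assumes class priors equal to the class proportions (8/14 and 6/14) and unscaled standard deviations, so its scores are not expected to reproduce the probability values in Table 2, whose exact scaling is not specified in the text; only the comparison between the two classes is illustrated. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Training rows copied from Table 1: IP address -> (IIP, PL, label).
rows = {
    "192.168.10.2": (36, 10048, "NORMAL"),
    "192.168.10.3": (2449, 384809, "NORMAL"),
    "192.168.10.4": (786, 111132, "NORMAL"),
    "192.168.10.5": (1003, 140340, "NORMAL"),
    "192.168.10.6": (1174, 160075, "NORMAL"),
    "192.168.10.7": (1118, 148707, "NORMAL"),
    "192.168.10.8": (1200, 161525, "NORMAL"),
    "192.168.10.9": (1606, 226306, "NORMAL"),
    "172.10.64.199": (4515, 1527368, "ATTACK"),
    "172.10.85.151": (14320, 2522499, "ATTACK"),
    "172.10.201.5": (9811, 2044024, "ATTACK"),
    "172.10.201.19": (4407, 1014119, "ATTACK"),
    "172.10.71.29": (8338, 2356408, "ATTACK"),
    "172.10.71.49": (11206, 1694624, "ATTACK"),
}


def gaussian_pdf(x, mu, sigma):
    """Gaussian density, formula (1)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def fit(rows):
    """For each class, store its prior and the (mean, std) of each feature."""
    model = {}
    for label in ("NORMAL", "ATTACK"):
        members = [(iip, pl) for iip, pl, lab in rows.values() if lab == label]
        features = list(zip(*members))  # one tuple per feature
        prior = len(members) / len(rows)
        model[label] = (prior, [(mean(f), stdev(f)) for f in features])
    return model


def predict(model, point):
    """Return the class with the largest prior times product of Gaussian densities."""
    def score(label):
        prior, stats = model[label]
        return prior * math.prod(gaussian_pdf(v, m, s) for v, (m, s) in zip(point, stats))
    return max(model, key=score)


model = fit(rows)
predictions = {ip: predict(model, (iip, pl)) for ip, (iip, pl, _) in rows.items()}
print(predictions)
```

With these assumptions the predicted class agrees with the Access column for all 14 addresses, which matches the Class column of Table 2.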

5. Conclusion and Future Work

The average and standard deviation can be used as a reference for constructing class
sets with the Gaussian Naive Bayes method. The average indicates the center of a class set,
whereas the standard deviation indicates its extent. The width of each class set is chosen
specifically to cover all members of that set. For the 14 observed IP addresses, the Gaussian
Naive Bayes method predicted the class of every address correctly. Further research is
expected to process more data in order to test the accuracy of the Gaussian Naive Bayes
method.